Homework 11: Maps

General instructions for all assignments:

  • Use this file as the template for your submission. Delete the unnecessary text (e.g. this text, the problem statements, etc.). That said, keep the nicely formatted “Problem 1”, “Problem 2”, “a.”, “b.”, etc.
  • Upload a single R Markdown file (named as: [AndrewID]-HW11.Rmd – e.g. “sventura-HW11.Rmd”) to the Homework 11 submission section on Blackboard. You do not need to upload the .html file.
  • The instructor and TAs will run your .Rmd file on their computers. If your .Rmd file does not knit on our computers, 10 points will automatically be deducted.
  • Your file should contain the code to answer each question in its own code block. Your code should produce plots/output that will be automatically embedded in the output (.html) file
  • Each answer must be supported by written statements (unless otherwise specified)
  • Include the name of anyone you collaborated with at the top of the assignment
  • Include the style guide you used below under Problem 0


Easy Problems

Problem 0

(4 points)

Organization, Themes, and HTML Output

  1. For all problems in this assignment, organize your output as follows:
  • Use code folding for all code. See here for how to do this.
  • Use a floating table of contents.
  • Suppress all warning messages in your output by using warning = FALSE and message = FALSE in every code block.
  • Use tabs only if you see fit – this is your choice.
  2. For all problems in this assignment, adhere to the following guidelines for your ggplot theme and use of color:
  • Do not use the default ggplot() color scheme.
  • For any bar chart or histogram, outline the bars (e.g. with color = "black").
  • Do not use both red and green in the same plot, since a large proportion of the population is red-green colorblind.
  • Try to use at most three colors in your themes. In previous assignments, many students used different colors for the axes, axis ticks, axis labels, graph titles, grid lines, background, etc. This is unnecessary and only makes your graphs more difficult to read. Use a more concise color scheme.
  • Make sure you use a white or gray background (preferably light gray if you use gray).
  • Make sure that titles, labels, and axes are in dark colors, so that they contrast well with the light colored background.
  • Only use color when necessary and when it enhances your graph. For example, if you have a univariate bar chart, there’s no need to color the bars different colors, since this is redundant.
  • In general, try to keep your themes (and written answers) professional. Remember, you should treat these assignments as professional reports.
  3. Treat your submission as a formal report:
  • Use complete sentences when answering questions.
  • Answer in the context of the problem.
  • Treat your submission as a formal “report”, in which you provide detailed analyses to answer the research questions asked in the problems.
  4. What style guide are you using for this assignment?
library(tidyverse)
library(data.table)
library(forcats)

#  Simple theme with a white background and bold, dark text
my_theme <-  theme_bw() +
  theme(axis.text = element_text(size = 12, color = "indianred4"),
        text = element_text(size = 14, face = "bold", color = "darkslategrey"))

#  Colorblind-friendly color palette
my_colors <- c("#000000", "#56B4E9", "#E69F00", "#F0E442", "#009E73", "#0072B2", 
               "#D55E00", "#CC79A7")
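
The output settings listed at the start of Problem 0 can also be applied once for the whole file rather than chunk by chunk. The sketch below is one way to do this, assuming the knitr package that R Markdown already relies on; setting the same options in each individual chunk, as the instructions describe, works equally well. Code folding and the floating table of contents are configured separately, in the YAML header of the .Rmd file (the code_folding and toc_float options of html_document).

```r
# Place in a setup chunk near the top of the .Rmd file.
# Sets document-wide defaults so that warnings and messages are
# suppressed in every later code chunk.
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```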


Problem 1

(3 points each)

Read

  1. Read this article. Write 1-3 sentences about what you learned from it.

  2. Read this article. Write 1-3 sentences about what you learned from it.

  3. Read the in-depth description of the ggmap package in the short paper by David Kahle and Hadley Wickham here. Write 1-3 sentences about what you learned from it.

  4. Read the article on ggmap here. Which functions can you use to create geographic heat maps?



Problem 2

(2 points each)

Maps with ggmap

Install and load the ggmap package. This package can be used to access maps from Google’s Maps API.

  1. Look at the help documentation for the get_map() function. What does it do? What are the different map sources that can be used in get_map()?

  2. In the help documentation, describe the zoom parameter. Roughly, what would be an appropriate value of this parameter if we wanted to display a square with width 1 mile? (Just a rough estimate is fine; an exact number is not required.)

  3. In the help documentation, what are the different maptype values that can be used? Which of these is unique to Google Maps?

  4. What does the map in the following code block show? Describe it. Explain what each of the parameters in the get_map() and ggmap() functions is doing.

(Note: Before doing this, you may need to install the most updated versions of these packages from GitHub – see commented code below.)

#devtools::install_github('hadley/ggplot2')
#devtools::install_github('thomasp85/ggforce')
#devtools::install_github('thomasp85/ggraph')
#devtools::install_github('slowkow/ggrepel')
library(ggmap)
map_base <- get_map(location = c(lon = -79.944248, lat = 40.4415861),
                    color = "color",
                    source = "google",
                    maptype = "hybrid",
                    zoom = 16)

map_object <- ggmap(map_base,
                    extent = "device",
                    ylab = "Latitude",
                    xlab = "Longitude")
map_object

  5. Recreate the map in part (d). Try changing the zoom parameter to a non-integer value (e.g. 16.5). What happens?

  6. Type class(map_object). What kind of object is your map?
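
For the question about the zoom parameter, it can help to see how the area covered by a map shrinks as zoom increases. The sketch below is a rough approximation only, not part of ggmap: it assumes the standard Web Mercator tiling (256-pixel tiles, with the number of tiles doubling at each zoom level), a 640-pixel-wide image, and the WGS84 equatorial radius. The function name map_width_miles is made up for this illustration.

```r
# Approximate ground width (in miles) covered by a Web Mercator map image.
# Assumptions (not from the assignment): 256-pixel tiles, a 640-pixel-wide
# image, and the WGS84 equatorial radius of 6378137 m.
map_width_miles <- function(zoom, lat, width_px = 640) {
  earth_circumference_m <- 2 * pi * 6378137
  # Ground distance per pixel shrinks with latitude and halves per zoom level
  meters_per_pixel <- earth_circumference_m * cos(lat * pi / 180) / (256 * 2^zoom)
  meters_per_pixel * width_px / 1609.344
}

# Widths at the latitude used in Problem 2 for a few candidate zoom levels
zooms <- 13:17
widths <- sapply(zooms, map_width_miles, lat = 40.4415861)
data.frame(zoom = zooms, width_miles = round(widths, 2))
```

Each increase of one zoom level halves the width of the displayed area, so a rough estimate only requires finding the level whose width is closest to the target.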



Problem 3

(2 points each)

Finding Latitudes and Longitudes

There are many ways to find latitude and longitude coordinates of specific places. Here’s one easy way:

  1. Go to Google Maps. Type in “times square, nyc” and hit enter. The map should center around New York City. Now, look at the URL in your internet browser. After the @ symbol, the latitude and longitude of the center of the map are displayed (in order). What is the latitude of the map centered on Times Square? What is the longitude?

  2. After the latitude and longitude, the zoom level is displayed (e.g. “17z”). Change this to zoom level 12, and delete any text to the right. This should give you a map that displays most of New York City. Do the latitude/longitude coordinates change when you do this?

  3. Using the code from Problem 2d as a template, create a black and white (color = "bw") map of NYC in R, centered near Times Square, at a zoom level of 12, and with a roadmap map type. Describe the map that is output in R.
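
The coordinates read off the URL can also be extracted programmatically. The sketch below uses base R only; the URL and the numbers in it are hypothetical placeholders in the format described above, not the answer to part 1. Note that the URL lists latitude first, while the location argument of get_map() takes longitude first.

```r
# Hypothetical URL in the form Google Maps shows after a search; the
# coordinates are illustrative placeholders, not a measured answer.
url <- "https://www.google.com/maps/place/Example/@40.7500,-73.9900,17z"

# Capture "lat,lon,zoom" following the "@" symbol
pattern <- "@(-?[0-9.]+),(-?[0-9.]+),([0-9.]+)z"
parts <- regmatches(url, regexec(pattern, url))[[1]]

# parts[1] is the full match; parts[2:4] are the captured groups
center <- c(lat = as.numeric(parts[2]), lon = as.numeric(parts[3]))
zoom <- as.numeric(parts[4])

# get_map() expects longitude first, so reorder before passing it on
location <- c(lon = center[["lon"]], lat = center[["lat"]])
location
```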



Hard Problems

For both problems below, it may help to look over:

  • The Lecture 20 R Demo on Blackboard
  • The Maps Helper R Code file on Blackboard (thanks to TA Jennifer Jin for this!)

Both files are located under Course Content.

Problem 4

(30 points total)

Mapping US Flights

airports <- read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat",
                     col_names = c("ID", "name", "city", "country", "IATA_FAA", 
                                   "ICAO", "lat", "lon", "altitude", "timezone", "DST"))

routes <- read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat",
                   col_names = c("airline", "airlineID", "sourceAirport", 
                                 "sourceAirportID", "destinationAirport", 
                                 "destinationAirportID", "codeshare", "stops",
                                 "equipment"))

departures <- routes %>%
  dplyr::group_by(sourceAirportID) %>%
  dplyr::summarize(flights = n()) %>%
  mutate(sourceAirportID = as.integer(as.vector(sourceAirportID)))

arrivals <- routes %>%
  dplyr::group_by(destinationAirportID) %>%
  dplyr::summarize(flights = n()) %>%
  mutate(destinationAirportID = as.integer(as.vector(destinationAirportID)))

#  Merge each of the arrivals/departures data.frames with the airports data.frame above
airportD <- left_join(airports, departures, by = c("ID" = "sourceAirportID"))
airportA <- left_join(airports, arrivals, by = c("ID" = "destinationAirportID"))

map <- get_map(location = 'United States', zoom = 4)

mapPoints <- ggmap(map) +
    geom_point(aes(x = lon, y = lat, size = flights), 
        data = airportA) + 
    ggtitle("Location of Airports Sized by Number of Arriving Flights")

#  Add a custom legend to the plot
mapPointsLegend <- mapPoints +
  scale_size_area(breaks = c(10, 50, 100, 500, 900), 
                  labels = c(10, 50, 100, 500, 900), 
                  name = "Number of Arriving Routes")
mapPointsLegend

my_airport_code <- "LAX"
lax_routes <- dplyr::filter(routes, 
                           sourceAirport == my_airport_code | 
                            destinationAirport == my_airport_code)

lax_airport <- lax_routes %>%
    left_join(airports, by = c("sourceAirport" = "IATA_FAA")) %>%
    dplyr::select(destinationAirport, lat, lon, timezone) %>%
    dplyr::rename(source_lat = lat, source_lon = lon, source_timezone = timezone) %>%
    left_join(airports, by = c("destinationAirport" = "IATA_FAA")) %>%
    dplyr::select(source_lat, source_lon, source_timezone, lat, lon, timezone) %>%
    dplyr::rename(dest_lat = lat, dest_lon = lon, dest_timezone = timezone)

ggmap(map) + 
    geom_segment(aes(x = source_lon, y = source_lat, xend = dest_lon,
                   yend = dest_lat), data = lax_airport, alpha=.15) +
    labs(x = "Longitude", y = "Latitude",
         title = "Flights To and From Los Angeles") +
    theme_void()

ggmap(map) + 
    geom_curve(aes(x = source_lon, y = source_lat, xend = dest_lon,
             yend = dest_lat), data = lax_airport, 
             arrow = arrow(length = unit(0.02, "npc")), alpha=.15) +
    labs(x = "Longitude", y = "Latitude",
         title = "Flights To and From Los Angeles") + 
  coord_cartesian()

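#  Change in time zone (in hours) between the source and destination airports
lax_airport$change_timezone <- lax_airport$source_timezone - lax_airport$dest_timezone

#  Keep only the routes that will be shown on the map (time zone change of at most 3 hours)
lax_airport <- lax_airport[which(abs(lax_airport$change_timezone) <= 3), ]

#  Convert to a factor so that each change gets its own discrete color
lax_airport$change_timezone <- factor(lax_airport$change_timezone)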

ggmap(map) + 
  geom_curve(aes(x = source_lon, y = source_lat, xend = dest_lon,
                   yend = dest_lat, color = change_timezone), data = lax_airport, 
             arrow = arrow(length = unit(0.02, "npc"))) +
    labs(x = "Longitude", y = "Latitude", 
         title = "Flights To and From Los Angeles",
         color = "Change in \nTime Zone \nin Hours") +
    coord_cartesian() + 
    scale_color_manual(values = my_colors)



Problem 5

(40 points)

Choropleth Maps of Rent Prices

rent <- read_csv("https://raw.githubusercontent.com/sventura/315-code-and-datasets/master/data/price.csv")
rent_jan2017 <- rent %>%
  select(County, State, `January 2017`) %>%
  rename(jan_2017 = `January 2017`) %>%
  arrange(State) %>%
  group_by(State) %>%
  summarize(mean_rent = mean(jan_2017))

state_data <- tibble(state.abb, state.name) %>%
  mutate(state.name = tolower(state.name)) %>%
  left_join(rent_jan2017, by = c("state.abb" = "State"))

state_borders <- map_data("state") %>%
  left_join(state_data, by = c("region" = "state.name"))
ggplot(state_borders, aes(x = long, y = lat, fill = mean_rent)) + 
  geom_polygon(aes(group = group), color = "black") + 
  theme_void() + 
  coord_map("mercator") + 
  scale_fill_gradient2(high = "darkred", low = "darkblue", 
                       mid = "white", midpoint = 1500) +
  labs(title = "Mean Rent per State, Jan. 2017",
       fill = "Mean Rent ($)")

The states with the highest average rent appear to be California, Massachusetts, and New Jersey, while the states with the lowest average rent include Oklahoma, West Virginia, and Arkansas. Average rents appear to be higher along the Pacific coast and in New England, and lower in the Deep South, the Midwest, and the Mountain time zone.

rent_jan2017 <- rent %>%
  select(County, State, `January 2017`) %>%
  rename(jan_2017 = `January 2017`) %>%
  arrange(County) %>%
  group_by(County) %>%
  summarize(mean_rent = mean(jan_2017)) %>%
  mutate(County = tolower(County))

county_borders <- map_data("county") %>%
  left_join(rent_jan2017, by = c("subregion" = "County"))

ggplot(county_borders, aes(x = long, y = lat, fill = mean_rent)) + 
  geom_polygon(aes(group = group)) + 
  coord_map("mercator") + 
  scale_fill_gradient2(high = "darkred", low = "darkblue", 
                       mid = "white", midpoint = 2000) +
  labs(title = "Mean Rent per County, Jan. 2017",
       fill = "Mean Rent ($)") +
  #  Apply the complete theme first; adding it last would reset the centered title
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

The highest county-level average monthly rents appear predominantly on the coasts. Above-average monthly rents are concentrated in a few counties in California, New York, New Jersey, and Florida. There also appear to be a few high-rent counties in the Midwest. These counties are substantially more expensive than most of the country: they appear to have average rents of 3000 dollars per month and above, while the remaining counties appear to have average rents below 2000 dollars per month. A lot of data appears to be missing for the Great Plains states.
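
Some of the gaps noted above may come from county names that fail to match in the join rather than from rent data that is truly absent. A minimal sketch of how to check this, using base R and made-up names: the two vectors below are hypothetical stand-ins for county_borders$subregion and rent_jan2017$County, not values taken from the data.

```r
# Toy stand-ins (hypothetical values) for the map regions and the rent table
map_regions <- c("allegheny", "los angeles", "st clair", "de kalb")
rent_counties <- tolower(c("Allegheny", "Los Angeles", "St. Clair", "DeKalb"))

# Map regions with no matching rent row would be drawn as missing
unmatched <- setdiff(map_regions, rent_counties)
unmatched
```

Running the same setdiff() on the real columns would show whether the blank areas of the map reflect missing rent data or differences in spelling and punctuation between the two sources.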



Bonus Problems

See the BonusProblems assignment on Blackboard.